


Chapter 3 Organisation Of Data



1. Introduction

After learning how to collect data in the previous chapter, the next logical step is to understand how to organize it. The data we first collect, known as raw data, is often chaotic, disorganized, and difficult to comprehend, much like a pile of assorted items at a local junk dealer's (kabadiwallah's) shop.

Just as a kabadiwallah sorts his junk into categories like glass, plastic, and metal to manage his business efficiently, a statistician must organize raw data into a structured format. This process is called classification. The primary purpose of classifying data is to arrange it into groups based on common characteristics, which brings order to the information and makes it suitable for further statistical analysis and interpretation.



2. Raw Data

Raw data is the term used for data in its original, unorganized form, exactly as it was collected. This type of data is often large, unwieldy, and confusing. Trying to draw meaningful conclusions directly from a large set of raw data is an extremely tedious and often impossible task.

For instance, look at the following raw data representing the mathematics marks of 100 students.

 47   45   10   60   51   56   66  100   49   40
 60   59   56   55   62   48   59   55   51   41
 42   69   64   66   50   59   57   65   62   50
 64   30   37   75   17   56   20   14   55   90
 62   51   55   14   25   34   90   49   56   54
 70   47   49   82   40   82   60   85   65   66
 49   44   64   69   70   48   12   28   55   65
 49   40   25   41   71   80    0   56   14   22
 66   53   46   70   43   61   59   12   30   35
 45   44   57   76   82   39   32   14   90   25

From this table, it is difficult to quickly determine key information, such as the highest or lowest score, the average performance, or how many students passed. To make sense of this data, it must be organized and summarized through classification. This process makes the data comprehensible and allows for easy location of information, comparison, and inference.



3. Classification of Data

Classification is the process of arranging or organizing data into groups or classes based on some shared criteria or characteristic. The method of classification depends entirely on the purpose of the study. There are four primary types of classification:

  1. Chronological Classification: Data is classified based on time. The arrangement can be in ascending or descending order with respect to years, months, weeks, or any other time period. This type of data is also known as a time series.

    Example 1. Population of India from 1951 to 2011.

    Year    Population (Crores)
    1951    35.7
    1961    43.8
    1971    54.6
    1981    68.4
    1991    81.8
    2001    102.7
    2011    121.0
  2. Spatial (Geographical) Classification: Data is classified based on geographical location, such as countries, states, cities, or districts.

    Example 2. Yield of Wheat for Different Countries (2013).

    Country     Yield (kg/hectare)
    Canada      3594
    China       5055
    France      7254
    India       3154
    Pakistan    2787
  3. Qualitative Classification: Data is classified based on descriptive characteristics or attributes that cannot be measured numerically. Examples include gender, religion, nationality, or literacy. The classification is done based on the presence or absence of an attribute.

    Example 3. Population classified by gender and marital status.

    This is a manifold classification, where the data is first divided into two groups (Male/Female) and then each group is further subdivided based on another attribute (Married/Unmarried).

    Figure: A tree diagram in which the population is split into Male and Female, and each of those groups is further split into Married and Unmarried.
  4. Quantitative Classification: Data is classified based on characteristics that can be measured numerically, such as height, weight, age, income, or marks. When such data is grouped into classes, it forms a quantitative classification, typically presented as a frequency distribution.
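The manifold classification in Example 3 can be sketched in Python. The records below are hypothetical and serve only to illustrate the two-level grouping; `collections.Counter` from the standard library does the counting.

```python
from collections import Counter

# Hypothetical records: (gender, marital status) for eight people.
people = [
    ("Male", "Married"), ("Female", "Unmarried"), ("Male", "Unmarried"),
    ("Female", "Married"), ("Male", "Married"), ("Female", "Married"),
    ("Female", "Unmarried"), ("Male", "Married"),
]

# First level: classify by gender alone.
by_gender = Counter(gender for gender, _ in people)

# Second level: subdivide each gender group by marital status.
by_gender_and_status = Counter(people)

for gender in sorted(by_gender):
    print(gender, by_gender[gender])
    for status in ("Married", "Unmarried"):
        print("   ", status, by_gender_and_status[(gender, status)])
```

Each person lands in exactly one of the four final groups, so the subgroup counts add up to the size of the population.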


4. Variables: Continuous and Discrete

A variable is a characteristic that can be measured and whose value changes from one observation to another. Variables can be broadly classified into two types:

  1. Continuous Variable: A variable that can take any numerical value within a given range. This includes integers, fractions, and decimals. Its value can change in infinitely small gradations.
    • Examples: Height, weight, time, distance, temperature. A person's height does not jump from 150 cm to 151 cm; it passes through every possible value in between, such as 150.1 cm, 150.11 cm, etc.
  2. Discrete Variable: A variable that can only take specific, distinct values and "jumps" from one value to the next without taking any intermediate values. These are typically values that can be counted in whole numbers.
    • Examples: The number of students in a class, the number of cars on a road, or the number appearing on a rolled die. You can have 25 or 26 students, but not 25.5 students.


5. What Is a Frequency Distribution?

A frequency distribution is a concise and comprehensive way to classify the raw data of a quantitative variable. It is a table that organizes data by grouping it into classes and showing the number of observations (frequency) that fall into each class.

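Key Components of a Frequency Distribution:

  • Class: a group of values into which observations are placed (e.g., 10-20).
  • Class Limits: the lower and upper values that bound a class.
  • Class Interval (Class Size): the difference between the upper and lower class limits.
  • Class Mark (Class Mid-Point): the middle value of a class, found as (upper class limit + lower class limit) ÷ 2.
  • Frequency: the number of observations that fall into a class.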

How to Prepare a Frequency Distribution?

Constructing a frequency distribution involves making several key decisions:

  1. Should class intervals be equal or unequal?

    Equal intervals are generally preferred for simplicity and ease of comparison. However, unequal intervals are used when data is highly skewed, such as income data, where a few individuals may have very high incomes. Using equal intervals in such cases would either create too many classes or mask important details at the lower or upper ends.

  2. How many classes should there be?

    There is no fixed rule, but typically, the number of classes is kept between 6 and 15. Too few classes can hide important patterns, while too many can be as confusing as the raw data itself.

  3. What should be the size of each class?

    The size (or width) of the class interval is linked to the number of classes and the range of the data (Range = Largest Value - Smallest Value). Once the desired number of classes is decided, the approximate size of the class interval can be found by dividing the range by the number of classes.

  4. How should class limits be determined?

    This involves choosing between two methods:

    • Inclusive Method: Both the lower and upper limits of a class are included in that class itself (e.g., 0-10, 11-20, 21-30). This method is often used for discrete variables. A notable feature is the "gap" between the upper limit of one class and the lower limit of the next.
    • Exclusive Method: The upper limit of a class is excluded, and any observation equal to the upper limit is included in the next class (e.g., 0-10, 10-20, 20-30). In the class 10-20, values from 10 up to (but not including) 20 are included. This method is preferred for continuous variables as it ensures continuity in the data.
  5. How is the frequency for each class found?

    This is done by going through the raw data and using tally marks to count how many observations fall into each class. A tally mark (/) is placed against a class for each observation. For ease of counting, tallies are grouped in fives: four marks (////) with a fifth drawn diagonally across them. The total number of tally marks for a class is its frequency.
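As a rough sketch of decisions 2 to 5, the snippet below builds a frequency distribution for the 100 marks listed under Raw Data. The choice of 10 classes, the variable names, and the decision to count the top score of 100 in the last class (90-100) are assumptions made for illustration.

```python
# Marks of 100 students, taken from the raw data in Section 2.
marks = [
    47, 45, 10, 60, 51, 56, 66, 100, 49, 40,
    60, 59, 56, 55, 62, 48, 59, 55, 51, 41,
    42, 69, 64, 66, 50, 59, 57, 65, 62, 50,
    64, 30, 37, 75, 17, 56, 20, 14, 55, 90,
    62, 51, 55, 14, 25, 34, 90, 49, 56, 54,
    70, 47, 49, 82, 40, 82, 60, 85, 65, 66,
    49, 44, 64, 69, 70, 48, 12, 28, 55, 65,
    49, 40, 25, 41, 71, 80, 0, 56, 14, 22,
    66, 53, 46, 70, 43, 61, 59, 12, 30, 35,
    45, 44, 57, 76, 82, 39, 32, 14, 90, 25,
]

NUM_CLASSES = 10  # assumed choice, within the usual 6 to 15

lowest, highest = min(marks), max(marks)
data_range = highest - lowest        # Range = Largest Value - Smallest Value
# Integer division is safe here because 100 / 10 is exact.
width = data_range // NUM_CLASSES    # class size = range / number of classes

# Exclusive method: class lower-upper holds values with lower <= value < upper.
frequencies = [0] * NUM_CLASSES
for mark in marks:
    index = (mark - lowest) // width
    # Assumption: the highest value (100) is counted in the last class, 90-100.
    index = min(index, NUM_CLASSES - 1)
    frequencies[index] += 1

for i, frequency in enumerate(frequencies):
    lower = lowest + i * width
    print(f"{lower}-{lower + width}: {frequency}")
```

The loop plays the role of the tally marks: each observation adds one count to the class it falls into, and the counts sum to the 100 observations.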

Adjustment in Class Interval

When using the inclusive method for a continuous variable, the "gap" between classes needs to be removed to ensure continuity. This is done through an adjustment:

  1. Find the difference between the lower limit of a class and the upper limit of the preceding class. (e.g., in a series 800-899, 900-999, the difference is $900-899 = 1$).
  2. Divide this difference by two (e.g., $1 \div 2 = 0.5$).
  3. Subtract this value from all lower limits and add it to all upper limits. (e.g., 800-899 becomes 799.5-899.5).
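The three steps above can be sketched as a small Python function. The function name `adjust_class_limits` is hypothetical, and the sketch assumes every pair of neighbouring classes has the same gap.

```python
def adjust_class_limits(classes):
    """Turn inclusive class limits into continuous class boundaries."""
    # Step 1: gap between the lower limit of the second class
    # and the upper limit of the first class.
    gap = classes[1][0] - classes[0][1]
    # Step 2: halve the gap.
    half = gap / 2
    # Step 3: subtract from every lower limit, add to every upper limit.
    return [(lower - half, upper + half) for lower, upper in classes]


inclusive = [(800, 899), (900, 999)]
print(adjust_class_limits(inclusive))
```

After the adjustment, the upper boundary of the first class (899.5) equals the lower boundary of the second, so no value can fall between classes.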

Loss of Information

A major drawback of classifying data into a frequency distribution is the loss of information. Once raw data is grouped into classes, the individual values of the observations are lost. All subsequent calculations are based on the class mark, which is an assumed representative value for all observations in that class. This is an approximation, but it is a necessary trade-off for making large datasets comprehensible and manageable.
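A small sketch can make this approximation concrete. It compares the mean of a few hypothetical raw values with the mean computed from class marks, taking the class mark as the midpoint of the class limits. The data and the three exclusive classes are assumptions chosen for illustration.

```python
# Hypothetical raw observations.
values = [12, 14, 18, 21, 23, 29, 35]

# Exclusive classes of width 10: 10-20, 20-30, 30-40.
classes = [(10, 20), (20, 30), (30, 40)]

# Mean from the raw data, using every individual value.
raw_mean = sum(values) / len(values)

# Mean from the frequency distribution, using only class marks.
grouped_total = 0
for lower, upper in classes:
    frequency = sum(1 for v in values if lower <= v < upper)
    class_mark = (lower + upper) / 2  # midpoint stands in for every value in the class
    grouped_total += frequency * class_mark
grouped_mean = grouped_total / len(values)

print(round(raw_mean, 2), round(grouped_mean, 2))
```

The two means are close but not equal, because 12, 14 and 18 are all treated as 15 once they are grouped into the class 10-20.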

Frequency Array

For a discrete variable, the classification of its data is called a Frequency Array. Instead of grouping data into classes, this is a simple table that lists each distinct value of the variable and its corresponding frequency (how many times it appears in the dataset).
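A frequency array can be sketched with `collections.Counter`. The household sizes below are hypothetical.

```python
from collections import Counter

# Hypothetical data: number of members in each of 12 households.
household_sizes = [3, 4, 2, 5, 4, 3, 4, 6, 2, 4, 3, 5]

# Each distinct value paired with how many times it occurs, in ascending order.
frequency_array = sorted(Counter(household_sizes).items())

print("Size  Frequency")
for size, frequency in frequency_array:
    print(f"{size:>4}  {frequency:>9}")
```

No classes are needed here: the variable takes only a handful of distinct whole-number values, so each value gets its own row.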



6. Bivariate Frequency Distribution

Sometimes, we collect data on two variables simultaneously for each element of a sample (e.g., collecting both the height and weight of each student). This is known as bivariate data. To summarize such data, we use a Bivariate Frequency Distribution.

This is a two-way table where the classes of one variable are arranged in rows and the classes of the other variable are arranged in columns. Each cell in the table shows the joint frequency, that is, the number of observations that fall into that specific row class and column class simultaneously. This type of table is also known as a contingency table and is useful for studying the relationship between two variables, for example when measuring correlation.

Rows: Ad Expenditure (in '000 ₹); Columns: Sales (in Lakh ₹)
Ad Expenditure    115–125    125–135    135–145    145–155    155–165    165–175    Total
62–64213
64–66314
66–6812115
68–701124
70–72114
Total25631120

In this table, for example, there is 1 firm whose advertisement expenditure is between ₹64-66 thousand and whose sales are between ₹135-145 lakh.
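The way joint frequencies are counted can be sketched as follows. The seven firms below are hypothetical and are not the data behind the table above; only the class widths (2 for ad expenditure, 10 for sales) and starting points follow the table. The helper `class_of` is likewise an illustrative name.

```python
from collections import Counter

# Hypothetical (ad expenditure in '000 ₹, sales in lakh ₹) pairs for seven firms.
firms = [(62.5, 118), (63, 120), (63, 127), (65, 138),
         (67, 121), (67.5, 142), (71, 168)]

AD_START, AD_WIDTH = 62, 2          # ad expenditure classes: 62-64, 64-66, ...
SALES_START, SALES_WIDTH = 115, 10  # sales classes: 115-125, 125-135, ...


def class_of(value, start, width):
    """Return the (lower, upper) exclusive class that contains value."""
    lower = start + int((value - start) // width) * width
    return (lower, lower + width)


# Joint frequency: count firms by their (row class, column class) pair.
joint = Counter(
    (class_of(ad, AD_START, AD_WIDTH), class_of(sales, SALES_START, SALES_WIDTH))
    for ad, sales in firms
)

for (ad_class, sales_class), frequency in sorted(joint.items()):
    print(ad_class, sales_class, frequency)
```

Each key of `joint` corresponds to one non-empty cell of the two-way table, and the counts add up to the number of firms.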



7. Conclusion

Data collected from primary and secondary sources is initially raw and unorganized. To make this data useful for statistical analysis, it must first be classified. Classification is the process of organizing data into a structured format, most commonly a frequency distribution.

This process brings order to the data, making it concise and comprehensible. Understanding the techniques of classification and how to construct a frequency distribution for both continuous and discrete variables is a fundamental skill in statistics.



Recap



Exercises

This section contains questions for practice and self-assessment, designed to test the learner's understanding of the concepts discussed in the chapter, such as defining class midpoint, distinguishing between different types of distributions, creating frequency distributions from raw data, and understanding the concept of 'loss of information'.



Suggested Activity

This section provides ideas for practical projects, such as analyzing one's own past examination marks to see if the marks constitute a variable and to track improvement over time.